Voronoi Methods for Assessing Statistical Dependence

Authors

  • Jonas Mueller
  • David Reshef
Abstract

In this work, we leverage the intuitive geometry of the Voronoi diagram to develop a new class of methods for quantifying and testing for statistical dependence between real-valued random variables. The contributions of this work are threefold. First, we show that applying probability integral transforms to the datapoints produces Voronoi diagrams with desirable properties provided by the theory of copula distributions. Second, we propose a novel nonparametric statistical test (VORTI) that determines whether two variables are independent based on the areas of the cells in the Voronoi diagram. Third, building on similar ideas, we introduce a new method (SVORMI) for estimating mutual information, which relies on a novel regularization technique we found no mention of in the literature on density estimation via Voronoi tessellation. Through extensive simulation experiments, we demonstrate that our proposed estimator is competitive with the state-of-the-art k-nearest neighbor mutual information estimator of Kraskov et al. (2004).

Introduction

Detecting various forms of deviation from statistical independence between random variables is a cornerstone of statistics. A wide range of approaches to this problem have been explored, including correlation coefficients based on data values/ranks/distances [Pearson (1896); Hazewinkel (2001); Kendall (1938); Szekely and Rizzo (2009)], Fourier or RKHS representations [Stein and Weiss (1971); Gretton et al. (2005, 2007)], and information-theoretic tools [Cover and Thomas (2006)]. Each class of methods has its advantages and disadvantages and relies on different assumptions about the relationships present in the data. An extremely popular measure of dependence in machine learning is the mutual information (see Definition 1). Intuitively, mutual information (MI) measures the degree to which knowing the value of one variable gives information about another. It has several properties that make it a desirable basis for a measure of dependence: it is non-negative and symmetric, it is invariant under order-preserving transformations of the random variables X and Y, and, most importantly, it equals zero precisely when X and Y are statistically independent. These properties, which are not shared by simpler measures of correlation such as Pearson correlation, have led to its widespread use in diverse applications [Elemento et al. (2007)].
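The probability integral transform mentioned in the abstract, and the invariance of rank-based quantities under order-preserving transformations, can be illustrated with a small example. This is a minimal sketch and not the authors' implementation: `rank_transform` is a hypothetical helper, the `rank / (n + 1)` scaling is an assumed convention for the empirical transform, and the simulated data are arbitrary.

```python
import math
import random


def rank_transform(values):
    """Empirical probability integral transform (hypothetical helper).

    Maps each value to rank / (n + 1), so the result lies strictly
    inside (0, 1). Assumes the values contain no ties.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(200)]
y = [xi + 0.5 * random.gauss(0.0, 1.0) for xi in x]  # y depends on x

# Transformed sample: each margin is spread evenly over (0, 1),
# while the dependence between x and y is carried by the pairing.
u = rank_transform(x)
v = rank_transform(y)

# An order-preserving transformation of x leaves its ranks unchanged,
# so anything computed from (u, v) is unaffected by it.
u_exp = rank_transform([math.exp(xi) for xi in x])
print(u == u_exp)
```

Because the transformed points depend only on ranks, a Voronoi diagram built on the pairs `(u, v)` would be unchanged by monotone rescaling of either variable, which mirrors the invariance property of mutual information described in the introduction.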




Publication date: 2015